Collecting Player Game Logs

One of the primary modules in py-goldsberry is the player module. It provides access to a multitude of player-level statistics.

Each class in the player module requires a specific playerID. If you have looked through the first tutorial, you can see that py-goldsberry has a built-in function that makes it easy to find the playerIDs for a given season.


In [1]:
import goldsberry
import pandas as pd
goldsberry.__version__


Out[1]:
'1.0.1'

One of the many things you can do with py-goldsberry is generate a list of game logs for a single player or for the entire league. This takes two built-in methods and a simple custom function.

First, we generate a list of players from the current season using the built-in PlayerList() class, and convert the output of its players() method to a Pandas DataFrame.


In [2]:
players = goldsberry.PlayerList()
players2015 = pd.DataFrame(players.players())
players2015.head()


Out[2]:
   DISPLAY_FIRST_LAST  DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  GAMES_PLAYED_FLAG  PERSON_ID  PLAYERCODE     ROSTERSTATUS  TEAM_ABBREVIATION  TEAM_CITY      TEAM_CODE  TEAM_ID     TEAM_NAME  TO_YEAR
0  Quincy Acy          Acy, Quincy               2012       Y                  203112     quincy_acy     1             SAC                Sacramento     kings      1610612758  Kings      2015
1  Jordan Adams        Adams, Jordan             2014       Y                  203919     jordan_adams   1             MEM                Memphis        grizzlies  1610612763  Grizzlies  2015
2  Steven Adams        Adams, Steven             2013       Y                  203500     steven_adams   1             OKC                Oklahoma City  thunder    1610612760  Thunder    2015
3  Arron Afflalo       Afflalo, Arron            2007       Y                  201167     arron_afflalo  1             NYK                New York       knicks     1610612752  Knicks     2015
4  Alexis Ajinca       Ajinca, Alexis            2008       Y                  201582     alexis_ajinca  1             NOP                New Orleans    pelicans   1610612740  Pelicans   2015

Once the data is in a DataFrame, you can use Pandas to search for specific players, teams, rookie cohorts, and so on.
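As a rough illustration of those searches, the sketch below filters by team and by rookie year. The `sample` frame is an assumption made for the example: it copies three columns from the five rows shown in Out[2] so the code runs without contacting the NBA servers. It also assumes FROM_YEAR is a player's first season and may arrive as text, hence the cast to int.

```python
import pandas as pd

# Hand-built stand-in for players2015, copied from the rows shown in Out[2].
sample = pd.DataFrame({
    "DISPLAY_FIRST_LAST": ["Quincy Acy", "Jordan Adams", "Steven Adams",
                           "Arron Afflalo", "Alexis Ajinca"],
    "FROM_YEAR": ["2012", "2014", "2013", "2007", "2008"],
    "TEAM_ABBREVIATION": ["SAC", "MEM", "OKC", "NYK", "NOP"],
})

# Everyone on a given team: an exact match on the abbreviation.
thunder = sample.loc[sample["TEAM_ABBREVIATION"] == "OKC"]

# A rookie cohort: players whose first season is 2014.
# astype(int) handles the year whether it is stored as text or a number.
cohort_2014 = sample.loc[sample["FROM_YEAR"].astype(int) == 2014]

print(thunder)
print(cohort_2014)
```

The same boolean masks should work on the full players2015 DataFrame.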

Let's start by looking for just James Harden.


In [3]:
players2015.loc[players2015['DISPLAY_LAST_COMMA_FIRST'].str.contains("Harden")]


Out[3]:
     DISPLAY_FIRST_LAST  DISPLAY_LAST_COMMA_FIRST  FROM_YEAR  GAMES_PLAYED_FLAG  PERSON_ID  PLAYERCODE    ROSTERSTATUS  TEAM_ABBREVIATION  TEAM_CITY  TEAM_CODE  TEAM_ID     TEAM_NAME  TO_YEAR
179  James Harden        Harden, James             2009       Y                  201935     james_harden  1             HOU                Houston    rockets    1610612745  Rockets    2015

Fortunately, there is only one player with Harden somewhere in his name. If we had searched for James, it would have been a bit of a different story.
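When a substring does match several players, an exact comparison on the full name narrows the result. The sketch below shows the difference using "Adams", which matches two of the players listed in Out[2]. The `sample` frame is a hand-built stand-in (an assumption for the example), not the full player list.

```python
import pandas as pd

# Hand-built stand-in using names and IDs that appear in the outputs above.
sample = pd.DataFrame({
    "DISPLAY_FIRST_LAST": ["Jordan Adams", "Steven Adams", "James Harden"],
    "DISPLAY_LAST_COMMA_FIRST": ["Adams, Jordan", "Adams, Steven", "Harden, James"],
    "PERSON_ID": [203919, 203500, 201935],
})

# A substring search can return more than one row.
# regex=False treats the pattern as plain text rather than a regular expression.
partial = sample.loc[
    sample["DISPLAY_LAST_COMMA_FIRST"].str.contains("Adams", regex=False)
]

# An exact match on the full name returns only the player we want.
exact = sample.loc[sample["DISPLAY_FIRST_LAST"] == "Steven Adams"]

print(len(partial), len(exact))
```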

Because we want to get information on James Harden, we need to make note of the value in his PERSON_ID column. This is the unique ID number associated with Harden in the NBA database. Any time we want to search for Harden-related information, this is the value to use.

To make it easy to remember, I'm going to save it as a variable in our environment so that we can call it any time we want. It's a bit easier for me to remember harden_id than 201935.


In [4]:
harden_id = '201935'
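Instead of typing the ID by hand, it can also be pulled out of the DataFrame. The helper below is a minimal sketch: `find_player_id` is a hypothetical name and not part of py-goldsberry, and `sample` is a hand-built stand-in for players2015. It returns the ID as a string to match the cell above and raises an error if the name is missing or ambiguous.

```python
import pandas as pd

def find_player_id(players, full_name):
    """Return the PERSON_ID (as a string) of the one player named full_name.

    Hypothetical helper, not part of py-goldsberry. `players` is expected to
    have DISPLAY_FIRST_LAST and PERSON_ID columns, like players2015.
    """
    matches = players.loc[players["DISPLAY_FIRST_LAST"] == full_name, "PERSON_ID"]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match for {full_name!r}, "
                         f"found {len(matches)}")
    return str(matches.iloc[0])

# Hand-built stand-in for players2015, using rows shown in the outputs above.
sample = pd.DataFrame({
    "DISPLAY_FIRST_LAST": ["Steven Adams", "James Harden"],
    "PERSON_ID": [203500, 201935],
})

harden_id = find_player_id(sample, "James Harden")
print(harden_id)
```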

Game Logs

One of many pieces of available data for a player is their game logs. You can access these by using the goldsberry.player.game_logs() class and passing in the playerID.

A few arguments to game_logs adjust the data that gets returned. The most important is the season argument. When you instantiate the class, you must pass a valid player ID. On instantiation, the class automatically grabs all of the player's game logs for the current season.


In [5]:
harden_game_logs = goldsberry.player.game_logs(harden_id)

Now that we've collected the data from the NBA website, we want to create a Pandas DataFrame to view and analyze.


In [6]:
harden_game_logs_2015 = pd.DataFrame(harden_game_logs.logs())

Notice that we passed harden_game_logs.logs() and not harden_game_logs to the DataFrame constructor. This is because many of the calls in py-goldsberry return multiple sets of data. Instead of making multiple calls to the NBA's server, a single call is made and all of the data is stored in the class. The various methods of the class provide access to the raw data.

(Until documentation is complete, take advantage of the [TAB] completion feature in Jupyter.)
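The same two steps (fetch the logs, build a DataFrame) can be repeated for several players with a small custom function. The sketch below is one possible shape for it, not the author's implementation: `collect_game_logs` and `fake_fetch` are hypothetical names, and the fetcher is passed in as an argument so the example runs offline. The rows returned by `fake_fetch` are made-up placeholders, not real statistics.

```python
import pandas as pd

def collect_game_logs(player_ids, fetch_logs):
    """Combine game logs for many players into one DataFrame.

    fetch_logs is any callable that takes a player ID and returns data that
    the DataFrame constructor accepts. With py-goldsberry this could be:
        lambda pid: goldsberry.player.game_logs(pid).logs()
    """
    frames = []
    for player_id in player_ids:
        logs = pd.DataFrame(fetch_logs(player_id))
        if not logs.empty:  # skip players with no games logged
            frames.append(logs)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

def fake_fetch(player_id):
    """Offline stand-in for the NBA call; returns placeholder rows only."""
    if player_id == "000000":  # pretend this player has no games
        return []
    return [{"Player_ID": player_id, "PTS": 10},
            {"Player_ID": player_id, "PTS": 20}]

all_logs = collect_game_logs(["201935", "203500", "000000"], fake_fetch)
print(all_logs)
```

For the whole league, the IDs could come from the PERSON_ID column of players2015. That means one request per player, so it may be worth pausing between calls.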


In [7]:
harden_game_logs_2015.head()


Out[7]:
   AST  BLK  DREB  FG3A  FG3M  FG3_PCT  FGA  FGM  FG_PCT  FTA  ...  PF  PLUS_MINUS  PTS  Player_ID  REB  SEASON_ID  STL  TOV  VIDEO_AVAILABLE  WL
0  4    0    2     10    5     0.500    27   13   0.481   8    ...  3   21          38   201935     4    22015      0    1    1                W
1  6    0    1     7     5     0.714    21   12   0.571   7    ...  0   29          34   201935     1    22015      0    3    1                W
2  13   0    3     14    6     0.429    31   14   0.452   8    ...  2   23          40   201935     4    22015      4    4    1                W
3  7    0    2     6     5     0.833    14   10   0.714   7    ...  3   -4          30   201935     2    22015      2    3    1                L
4  4    2    5     10    4     0.400    22   8    0.364   7    ...  4   -1          26   201935     5    22015      1    6    1                L

5 rows × 27 columns
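Once the logs are in a DataFrame, the usual Pandas tools apply. As a small sketch, the code below averages points and assists overall and by game result. The `logs` frame is a hand-built assumption that copies only the PTS, AST, and WL columns from the five rows shown in Out[7], so the numbers describe those five games only.

```python
import pandas as pd

# PTS, AST and WL copied from the five rows displayed in Out[7].
logs = pd.DataFrame({
    "PTS": [38, 34, 40, 30, 26],
    "AST": [4, 6, 13, 7, 4],
    "WL": ["W", "W", "W", "L", "L"],
})

# Scoring average across the five games.
avg_pts = logs["PTS"].mean()

# Average points and assists, split by wins and losses.
by_result = logs.groupby("WL")[["PTS", "AST"]].mean()

print(avg_pts)
print(by_result)
```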

If you've found this helpful or have any other requests, email me at bradley@cardinaladvising.com or post an issue on GitHub.